Introduction

Welcome!
Welcome to the 2023 #DegreesNYC R Course! As you know, #DegreesNYC is on a mission to radically transform the NYC public school system at all levels. Collective impact and youth engagement are pillars of #DegreesNYC’s mission and are the primary tools the organization uses to further its goals. The folks at #DegreesNYC, however, are aware that the policy space in which they work places a high value on quantitative evidence. This course is meant to democratize quantitative expertise by introducing you to some of the skills you will need to find, manipulate, and analyze data using R. Our hope is that this will not only be enriching for you personally but will also better position you to make policy arguments in the future.

During the course you are going to work primarily in R, and much of that work will be done in R Markdown. Don’t worry if you are not familiar with the language or with coding at all; this class and the Codecademy course that supplements it are meant for beginners. If you do have experience with R and feel that the course is moving too slowly or that you are not being sufficiently challenged, here are some options: 1) add complexity to the examples and challenges presented in class or on Codecademy, 2) read ahead in the syllabus and work on something that is at your level, or 3) spend extra time on your project. Though many other languages are used for data analysis (including Python, SAS, Stata, and SQL), you will not explore those here. If you’d like to know more about one of them, feel free to reach out to me. I am happy to share what I know, or to point you to a starting place if I am less experienced in that language myself.

Contact

Feel free to email me at any time at .

About the instructor

Reggie Gilliard came to data analytics and R through an unusual path. He started his career as a teacher in private schools, first at San Francisco Day School and then at Collegiate. Education is now, and has always been, important to him, but feeling that his impact could be greater outside of the classroom, he set off to pursue a master’s degree in education policy.

That degree led Reggie to the Research Alliance for New York City Schools, where his relationship with #DegreesNYC began. That position, in turn, led him to a PhD in the Economics and Education program at Teachers College, Columbia University. Currently, he is a Data Analytics Developer at Mathematica. He has experience using SAS, SQL, Stata, R, and Python (as well as tools like Excel), but is best versed in R and Stata.

Outside of work, he enjoys reading; playing sports, video games, and board games; trying new restaurants; and generally spending time with his friends and family.

Plan for the course

The course will be split into two units: Unit 1 - Learning and Unit 2 - Project.

Learning

During the Learning Unit you will focus on working through lessons. Each session will be split into two parts: 1) a fifteen-minute mini-lesson delivered by me or another member of the #DegreesNYC team, and 2) a Codecademy lesson. The mini-lessons are designed to align as closely as possible with that week’s Codecademy lesson, to reinforce and clarify the topics you will cover.

Project

Unit 2 will be dedicated to a complete analysis of a data set of your choice; finishing this project will be the goal for the final six weeks of the course. We may still briefly cover some topics that are beyond the scope of the lessons but are useful for creating an appealing report. Most of the time, however, will be spent in breakout groups working on your projects. All projects will be undertaken individually, but teams of two will be permitted if there is a compelling reason.

To complete the project you will need to take your data set through the following stages:

  • Research Question Creation

    • Brainstorm some questions that you are curious about and want to spend some time exploring. Many of you are students or are already working in the research field; if it is helpful, bring questions that are relevant for your work or your classes.
  • Data Identification

    • Identify a data set that can be used to address the question you are interested in. If you have access to a data set that you want to examine, you are encouraged to use it (note: think carefully about privacy concerns; if you are the only one who should see the data, you may want to create a mock data set that your classmates and I can view so that we can support your project work). If you don’t have access to a data set, there are many places you can go to find one:

    • NYC Open Data - New York City focused data sets.

    • Data.gov - U.S. data sets.

    • IPEDS - Postsecondary education data sets.

    • NCES - National (or state specific) school data sets.

    • Kaggle - Public data sets on a wide variety of topics (account needed).

  • Data cleaning

    • Use the tools you have learned in the class to clean the data that you’ve downloaded. Some considerations:

      • Are the variable names easy to work with?
      • Is the data tidy (i.e., is every column one variable, every row one observation, and every cell a single value)?
      • Are the variables that you need available in the data set? If not, can you create them?
      • Are the variables the correct type?
  • Data Analysis

    • Provide some basic descriptive statistics on the data set. Use measures such as the mean, median, min, max, or other statistics to describe variables of interest in your data set.
    • If you would like to go beyond standard descriptive analysis, consider using what you learn about ANOVAs on Codecademy to test whether a numeric variable differs significantly across the levels of a categorical variable of interest.
  • Data Visualization

    • Once you have cleaned and analyzed your data, it will be time to visualize your results. ggplot2 is the package that we will use for visualization in this class, and the possibilities are broad. The ggplot2 gallery shows the wide range of graphs that you can create using ggplot2 and related packages, and even provides code to get you started.
    • R Markdown displays the data sets and tables that you create, but the default output isn’t particularly attractive. You can create publication-ready tables using the kable() function (from the knitr package) and the kableExtra package. If your analysis includes substantial statistical reporting, use these to present your tables to the reader.
    • If you want to challenge yourself, try turning your R Markdown document into an interactive Shiny Dashboard!
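As a rough illustration of the Data Analysis stage, here is a minimal sketch that uses the built-in mtcars data set as a stand-in for your own data. Treating mpg as the outcome and cyl as the grouping variable is purely illustrative, and the object names mpg_by_cyl and mpg_anova are hypothetical. It computes descriptive statistics by group with dplyr, then fits a one-way ANOVA with base R’s aov().

```r
library(dplyr)

# Descriptive statistics for mpg within each number-of-cylinders group
mpg_by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    n = n(),
    mean_mpg = mean(mpg),
    median_mpg = median(mpg),
    min_mpg = min(mpg),
    max_mpg = max(mpg)
  )
print(mpg_by_cyl)

# One-way ANOVA: does mean mpg differ across cylinder groups?
# factor() makes sure cyl is treated as categorical rather than numeric
mpg_anova <- aov(mpg ~ factor(cyl), data = mtcars)
summary(mpg_anova)
```

To adapt this, swap in your own data frame and variable names; for an ANOVA, the grouping variable should be categorical and the outcome numeric.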

I encourage you to use the time in breakout rooms collaboratively: share your screens, share code, ask each other questions, and learn from one another. Use your classmates and me as resources to create something you feel is worthwhile. Although I suggest doing the project alone, that suggestion is meant to maximize learning, not to prevent “cheating”. Any and all collaboration is welcome in this course.

Syllabus

This syllabus will be the repository for everything course-related: information about lessons, the project, and enrichment is all contained here. The table of contents in the top left links to the introduction, the content for Units 1 and 2, and enrichment activities. Within the Unit 1 header there are tabs for each of the 6 lessons that you will work through during our time together, and each lesson has its own tabs for the topics you will cover, the lesson itself, and any helpful links you might want to explore. This document will be hosted publicly on GitHub, but it is a living document. If anything changes, or if there are problems with the URL leading to the syllabus, I will let you know as soon as possible.

Unit 1 - Learn R

Lesson 1 - An introduction to R (what, why, how)

Topics Covered

  1. What is R?
  2. Why use R?
  3. How do I install R?
  4. The tools of the class.

Lesson

What is R?

  • Fully realized programming language including loops, conditionals, user-defined functions, and several object-oriented systems.
  • Strong data storage, manipulation, and visualization capabilities.
  • Suite of statistical packages and built-in functions.

Why use R?

R is one of the most common languages for data science and data analysis. This is because, with experience, you can take a project from beginning to end using only R: you can download data; manipulate it by creating new variables, dropping missing observations, etc.; perform statistical analysis; and visualize and output results. R, like Stata and SAS, has built-in statistical analysis functionality, but unlike those two it is 100% free and open source. There is a wide range of functions in base R, and the R community has built a host of tools that make data analysis and visualization much easier than in base R alone (you will be working with one such suite of tools, the tidyverse and its dplyr package, in this class). If you are hoping to get a job doing analytic work, if you want advanced education in a social science field, if you want an introduction to coding/computer science, or if you are just looking to better understand quantitative analysis, R is a good place to begin.

How do I install R?

To install R you will need about 500 MB of free space on your computer (you may actually need less, but freeing up 500 MB ensures you will be able to download the data and packages used in the course). Perform the following steps:

  1. Go to the Comprehensive R Archive Network (CRAN)

  2. Click the link to download R for your operating system

  3. Select base

  4. Select the download link at the top of the screen.

You should also install RStudio. Here is how you can do that:

  1. Visit posit.co

  2. Click the download button in the upper right-hand corner

  3. Scroll down and select RStudio Desktop. Select download RStudio.

  4. Scroll down and select the appropriate installation for your operating system.

Class tools

As noted in the introduction, this class will use Codecademy to aid in learning basic coding skills in R and the basic R workflow. You will need a Codecademy account to retain your progress between sessions; you can register for one here. Another tool you will use throughout the class, and one you will also be using in your Codecademy work, is a suite of R packages called the Tidyverse. The Tidyverse is designed to make working with data easier and is commonly used in many organizations (we use it at Mathematica, for example). There are eight major packages in the Tidyverse as well as many other specialized packages, but the following are important for this class:

  1. ggplot2 - For visualizing data (i.e., creating graphs and charts).
  2. dplyr - For cleaning data. Loading dplyr also gives you access to the magrittr pipe (%>%), which you can use for many of your data cleaning tasks in this class (R 4.1.0 and later also has a built-in pipe: |>).
  3. readr - For reading data sets with simple syntax.

To install the Tidyverse, open R or RStudio and type install.packages("tidyverse") into the console.
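Installing a package only needs to happen once, but a package must be loaded with library() in each new R session before you use it (library(tidyverse) loads dplyr along with the other core packages). As a minimal sketch, the code below loads dplyr and shows that the magrittr pipe (%>%) and R’s built-in pipe (|>, which assumes R 4.1.0 or later) produce the same result. Counting the cars with more than 30 mpg in the built-in mtcars data set is just an example task, and the names n_magrittr and n_native are illustrative.

```r
# Load dplyr for this session (install.packages() only needs to be run once)
library(dplyr)

# magrittr pipe, available once dplyr is loaded
n_magrittr <- mtcars %>% filter(mpg > 30) %>% nrow()

# Built-in pipe (R 4.1.0 or later); no package is needed for |> itself
n_native <- mtcars |> filter(mpg > 30) |> nrow()

print(n_magrittr)
print(n_native)
```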

Finally, you will use R Markdown when creating your projects. R Markdown allows you to write code and text in the same document and have it rendered nicely as a Word document, PDF, or HTML file. I used R Markdown to write this syllabus. Using R Markdown makes it simple to integrate visualizations, code, and analyses into your reports.

That’s it. You’re all set up!

Getting help in R

While you are in class you can ask me questions and, of course, you can email me and other members of the #DegreesNYC team whenever you need support. There are also things you can do to find answers on your own before asking me or a classmate for help.

  1. If you are ever unsure what a function does, you can type ?functionname or help(functionname) into the console and a help file will pop up (in the bottom right corner of your screen if you are using RStudio) with useful information about the function.
?paste
  2. If you are curious about an entire package, you can type browseVignettes("packagename") to see the vignettes (long-form guides) it provides, or open one directly by name, as below. Note that not all packages have a vignette.
vignette("dplyr")
  3. If you need help resolving a more complex problem, try Google or another search engine first. There are many people working in R and many sites that aggregate answers to frequently asked questions. Stack Overflow is one that is well known in the coding community and that I personally use regularly at work. It is helpful to investigate solutions to your issues yourself first, because you will retain more information when you’ve had to spend some time searching and iterating on your own.

Install the Course’s exercises

Once you’ve installed R and RStudio, open RStudio, open a new script, and copy and paste the code below to install the exercises that accompany this course.

# Install the swirl package, which runs the interactive exercises
try(install.packages("swirl", repos = "https://cloud.r-project.org/"))

# Remove any previously installed copy of DegreesNYC_RCourse
# (silent = TRUE hides the error if no copy is installed yet)
try(swirl::uninstall_course("DegreesNYC_RCourse"), silent = TRUE)

# Install the latest DegreesNYC_RCourse from GitHub
try(swirl::install_course_github("R-Gilliard-Jr", "DegreesNYC_RCourse", branch = "main"))

Lesson 2 - Variables, data types, and operators

Topics Covered

  1. What are variables and why are they useful?
  2. Assigning variables in R (=, <-, ->)
  3. Data types
    • The six basic data types.
    • Character, numeric, and logical: the three most common types.
  4. Mathematical operators
  5. Logical operators

Lesson

Printing output

There are many ways to output values in R so that you can see them. One of the easiest is just typing the value into the console.

4
## [1] 4
"Hello"
## [1] "Hello"

Another way that is often used is to wrap the item you want to show in the print() function.

print(4)
## [1] 4
print("Hello")
## [1] "Hello"
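One reason print() is worth knowing: typing a bare value only displays it when you run it directly at the console (the top level). Inside a loop, R evaluates the value but does not display it, so you need print(). A small sketch (for loops come up later in the course):

```r
# Only the print() calls produce output; the bare `i` line is evaluated but not shown
for (i in 1:3) {
  i
  print(i)
}
## [1] 1
## [1] 2
## [1] 3
```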

An introduction to variables

Variables in R are convenient ways to keep track of values. There are times when you will want to perform an operation, save the value, perform some intermediate steps, and then use the original value that you saved. Let’s work with the example of addition. You could do the following to add 4 twice:

4 + 4
## [1] 8

This is perfectly fine. But you could also use variables to add 4 twice like this:

x <- 4
x + x
## [1] 8

Variables allow you to work with values dynamically, without having to remember every individual value you are working with. Keeping the math theme, suppose you wanted to implement the quadratic formula, which is written:

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$

And you are provided the following values:

  • a = 2
  • b = 4
  • c = 2

You could do this directly with numbers, as in

(-4 + sqrt(4^2 - 4*2*2))/(2*2)
## [1] -1

Or it could be done using variables, as in

a = 2
b = 4
2 -> c
(-b + sqrt(b^2 - 4*a*c))/(2*a)
## [1] -1

The advantage of the second option is that if the numbers you would like to enter into the formula change, you only have to change them once, when assigning them, and you can then reuse the formula as is. In the first case, you would need to go through the formula carefully and make sure you’ve entered every number in its correct location. The true power of variables will not be seen until you begin to work with for loops, but, for now, think of them as useful ways to store information.
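To see that advantage in action, here is a sketch that reuses the same formula with new, purely illustrative coefficients (a = 1, b = -3, c = 2, i.e. the equation x^2 - 3x + 2 = 0) and computes both roots, using + for one and - for the other. The names root_plus and root_minus are hypothetical.

```r
# New coefficients; the formula lines below are otherwise unchanged
a <- 1
b <- -3
c <- 2

# Both roots of the quadratic formula: one with +, one with -
root_plus <- (-b + sqrt(b^2 - 4*a*c))/(2*a)
root_minus <- (-b - sqrt(b^2 - 4*a*c))/(2*a)
root_plus
## [1] 2
root_minus
## [1] 1
```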

Assigning variables in R

In the examples above I created four different variables: x, a, b, and c. I assigned values to those variables in three different ways: <-, =, ->. You can use any of these assignment operators to assign a value to a variable. To assign a value with the equal sign or the left-arrow, the variable name should go on the left-hand side of the operator and the value that you want to assign should be on the right-hand side.

x <- 2
print(paste("x equals:", x))
## [1] "x equals: 2"
y = 2023
print(paste("y equals:", y))
## [1] "y equals: 2023"

There is also the right-arrow operator. If using this to assign a value to a variable, the value should be on the left-hand side and the name of the variable should be on the right-hand side.

"Hello" -> z
print(paste("z equals:", z))
## [1] "z equals: Hello"

There is nothing wrong with using any of the assignment operators, so you should go with the one you are most comfortable with, especially as you learn. The convention in most R code (and in style guides such as the tidyverse style guide), however, is to use the left-arrow unless there is good reason not to. The left-arrow avoids confusion with the equality check (==) and is easier to read than the right-arrow. I will use the left-arrow exclusively throughout the remainder of this course.

Basic data types

There are 6 basic data types:

  1. Logical
    • True/false data type (Boolean).
  2. Numeric
    • Real numbers, including decimals.
  3. Integer
    • Whole numbers (no decimal part).
  4. Complex
    • Numbers with a real and an imaginary part (e.g., 2+3i).
  5. Character
    • String values. These are values that can contain non-numeric characters (such as letters).
  6. Raw
    • Values stored as raw bytes.

In this class you will focus on the logical, numeric, and character data types. These are the three most common types. See the helpful links for more information about the types that you will not cover.
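A quick way to see all six types is to ask R for the class() of one example value of each; the values themselves are arbitrary examples.

```r
class(TRUE)
## [1] "logical"
class(3.14)
## [1] "numeric"
class(3L)
## [1] "integer"
class(2+3i)
## [1] "complex"
class("Hello")
## [1] "character"
class(as.raw(255))
## [1] "raw"
```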

Logical

Data of class logical can only take the values TRUE or FALSE (or NA, when the value is missing). Another name for this type of data is Boolean. In arithmetic, R treats TRUE as 1 and FALSE as 0. That means you can do things like add and subtract logical data even though they are not technically numeric.

x <- TRUE
y <- FALSE
z <- TRUE

class(x)
## [1] "logical"
x + y
## [1] 1
x + z
## [1] 2
x - z
## [1] 0
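That equivalence is handy in data analysis: sum() of a logical vector counts the TRUE values, and mean() gives the proportion that are TRUE. A sketch with made-up pass/fail values (the name passed is only illustrative, and c() combines values into a vector):

```r
# Hypothetical results for four students: TRUE = passed
passed <- c(TRUE, FALSE, TRUE, TRUE)
sum(passed)
## [1] 3
mean(passed)
## [1] 0.75
```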

Numeric

The numeric data type holds all real numbers. This means any number, including those with decimals and negatives, but excluding imaginary numbers. We have been working with real numbers throughout this lesson. Examples of real numbers include 4, -25, 1.33333, and pi.

class(4)
## [1] "numeric"

You might wonder how R can tell the difference between a numeric 4 and an integer 4. Mathematically, after all, 4 is both an integer and a real number. The way to pass an explicitly integer value to R is to append an L to the end of the number. For example:

class(4L)
## [1] "integer"
class(1040L)
## [1] "integer"

Appending an L will not convert a number that has a decimal into an integer; R keeps it numeric (and prints a warning saying so):

class(1.33L)
## [1] "numeric"

Finally, the as.*() family of functions (such as as.integer(), as.numeric(), and as.character()) converts values from one type to another for you:

x <- 4
class(x)
## [1] "numeric"
x <- as.integer(x)
class(x)
## [1] "integer"
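A few things to keep in mind when converting, sketched below: as.integer() drops the decimal part rather than rounding, and converting text that does not look like a number gives NA (R’s missing-value marker) along with a warning. This matters during data cleaning, when a column you expect to be numeric has been read in as character. The example values are arbitrary.

```r
as.integer(3.9)      # drops the decimal part rather than rounding
## [1] 3
as.numeric("1234")   # a string of digits converts cleanly
## [1] 1234
as.character(42)
## [1] "42"
as.numeric("abc")    # not a number: returns NA and warns "NAs introduced by coercion"
## [1] NA
```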

Character

The character data type holds strings. Strings contain a series of characters. Here are some examples of character data:

class("Hello")
## [1] "character"
class("1234")
## [1] "character"

You will notice that even though the second string (“1234”) contains only digits, R considers it to be of class character. This is important to remember because character and numeric values cannot be used in the same ways:

try("1" + "2")
## Error in "1" + "2" : non-numeric argument to binary operator
1 + 2
## [1] 3

The first line of code throws an error telling us that “1” and “2” are not numeric and therefore cannot be added. The second line of code returns what you would expect, 3. Folks coming from other programming languages should note that strings cannot be concatenated with the + operator in R; use paste() or paste0() instead (although you could also create an operator that does this yourself):

`%$$%` <- function(lhs, rhs) {
  out <- paste0(lhs, rhs)
  return(out)
}

"Hel" %$$% "lo"
## [1] "Hello"
"How" %$$% " are" %$$% " you?"
## [1] "How are you?"
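In everyday R code, the built-in functions paste() and paste0() handle concatenation, and the error from "1" + "2" goes away once the strings are converted with as.numeric(). A short sketch with arbitrary example strings:

```r
paste("How", "are", "you?")          # paste() separates pieces with a space by default
## [1] "How are you?"
paste0("Hel", "lo")                  # paste0() adds no separator
## [1] "Hello"
paste("2023", "06", "01", sep = "-") # sep sets a different separator
## [1] "2023-06-01"
as.numeric("1") + as.numeric("2")    # convert the strings first, then add
## [1] 3
```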

Mathematical operators

R is designed with statistical analysis in mind, so it is easy to do math in R. Most of the mathematical operators are intuitive, but there are some you may not be familiar with.

Addition, subtraction, multiplication, and division are all straightforward.

2 + 2
## [1] 4
2 - 2
## [1] 0
2 * 2
## [1] 4
2/2
## [1] 1

To exponentiate a number, use either ^ or ** (R translates ** into ^, and ^ is the form you will usually see).

3^2
## [1] 9
3**2
## [1] 9

There are also operators for the modulo and for integer division. The modulo returns the remainder when dividing two numbers; for example, 3 modulo 2 is 1. The modulo operator is %%.

3 %% 2
## [1] 1
6 %% 4
## [1] 2

Integer division divides two numbers and rounds the result down to a whole number; for positive numbers, that is just the integer portion of the result. For example, 5 integer divided by 2 is 2. The integer division operator is %/%.

5 %/% 2
## [1] 2
7 %/% 6
## [1] 1
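The two operators fit together: for any numbers x and y, y * (x %/% y) + (x %% y) gives back x. Note also that %/% rounds down, so with negative numbers it does not simply chop off the decimal. A common practical use of %% is checking whether numbers are even. A sketch with arbitrary example values:

```r
x <- 17
y <- 5
x %/% y
## [1] 3
x %% y
## [1] 2
y * (x %/% y) + (x %% y) == x   # the quotient and remainder rebuild x
## [1] TRUE
-5 %/% 2                        # rounds down: -2.5 becomes -3, not -2
## [1] -3
1:6 %% 2 == 0                   # TRUE for the even numbers
## [1] FALSE  TRUE FALSE  TRUE FALSE  TRUE
```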

Logical operators

You are also likely familiar with most of the logical operators. Rather than returning a number, these comparisons return a value of TRUE or FALSE. Greater than (>), less than (<), greater than or equal to (>=), less than or equal to (<=), equal to (==), and not equal to (!=) are all available in R. Note that you must use two equal signs in R when you want to test whether two things are equal. One equal sign, as we discussed earlier, is for assigning values to variables.

2 < 3
## [1] TRUE
3 > 2
## [1] TRUE
3 == 2
## [1] FALSE
3 != 2
## [1] TRUE
3 <= 2
## [1] FALSE
3 >= 2
## [1] TRUE

There are also operators for or (|), and (&), and not (!).

numlist <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
# Or is one, the other, or both
numlist[numlist < 3 | numlist > 7]
## [1] 1 2 8 9
# And keeps only values for which both conditions are TRUE
numlist[numlist < 9 & numlist > 7]
## [1] 8
# Not negates whatever follows
numlist[!(numlist >= 5)]
## [1] 1 2 3 4
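It can help to look at what happens inside the square brackets. A comparison on a vector returns one TRUE or FALSE per element, and using that logical vector as an index keeps only the elements marked TRUE. The name keep below is just an illustrative choice.

```r
numlist <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
keep <- numlist < 3 | numlist > 7
keep
## [1]  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE
numlist[keep]
## [1] 1 2 8 9
sum(keep)    # how many elements met the condition
## [1] 4
```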

Lesson 3 - Tidy data and dplyr

Topics Covered

  1. The dplyr package
  2. Working in dplyr
    • The pipe (%>%)
    • Other useful dplyr functions
  3. Tidy data
    • What is tidy data?
    • Why is it useful?

Lesson

Why dplyr?

One of R’s strengths, that it is open source, is also what makes it difficult to learn. There are often many ways to do the same thing, with solutions coming from base R as well as from many different packages. For example, suppose you load the built-in R data set mtcars and want to view only cars with mpg > 20:

# Print data set
mtcars
##                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
# Select cars with greater than 20 mpg using base R
mtcars[mtcars$mpg > 20, ]
##                 mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4      21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710     22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Merc 240D      24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230       22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Fiat 128       32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Honda Civic    30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## Toyota Corona  21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## Fiat X1-9      27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
# Select cars with greater than 20 mpg using dplyr (load the package first)
library(dplyr)
filter(mtcars, mpg > 20)
##                 mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4      21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710     22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Merc 240D      24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230       22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Fiat 128       32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Honda Civic    30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## Toyota Corona  21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## Fiat X1-9      27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

You can see these two solutions are equivalent. Industry standards emerge for precisely this reason: to standardize workflows so that colleagues, collaborators, critics, and competitors can understand each other’s code more easily. dplyr has become a de facto standard for much of data analysis in R.
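The equivalence between base R and dplyr also holds with more than one condition. In filter(), conditions separated by commas must all be TRUE, which matches combining them with & in base R. A sketch that keeps 4-cylinder cars with more than 20 mpg (an arbitrary example condition), assuming dplyr is installed; the names base_result and dplyr_result are illustrative.

```r
library(dplyr)

# Base R: combine the two conditions with &
base_result <- mtcars[mtcars$mpg > 20 & mtcars$cyl == 4, ]

# dplyr: separate the conditions with a comma
dplyr_result <- filter(mtcars, mpg > 20, cyl == 4)

nrow(dplyr_result)
## [1] 11
all.equal(base_result, dplyr_result)
## [1] TRUE
```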

dplyr provides a set of tools that make data cleaning and transformation more intuitive. You can read the package’s vignette for more information. This lesson will explain the 8 functions outlined in the vignette, using the mtcars data set. I will also talk about the pipe and why it is a useful tool for data analysis and for writing legible code.

%>% and dplyr functions

%>%

There are eight basic dplyr functions (and many more for specialized operations), but one of the most useful tools you get from dplyr actually comes from a different package, magrittr: the %>% (pipe) operator.

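The pipe operator takes the value on its left-hand side and passes it in as the first argument of the function call on its right-hand side. For example, (1 + 1) %>% sum(2) is the same as sum(2, 2), which equals 4. The parentheses matter: %>% binds more tightly than +, so 1 + 1 %>% sum(2) would be read as 1 + sum(1, 2). That also happens to equal 4, but for a different reason.

(1 + 1) %>%
  sum(2)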
## [1] 4
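Because the pipe fills in only the first argument, any remaining arguments are written as usual. A small sketch comparing a nested call with the piped version (the numbers are arbitrary, and nested and piped are illustrative names); round(2) in the pipe becomes round(<value>, 2):

```r
library(dplyr)  # provides %>%

# Nested: read from the inside out
nested <- round(sqrt(sum(c(1, 4, 9, 16))), 2)

# Piped: read from left to right, one step per line
piped <- c(1, 4, 9, 16) %>%
  sum() %>%
  sqrt() %>%
  round(2)

nested
## [1] 5.48
identical(nested, piped)
## [1] TRUE
```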

The pipe operator makes workflows cleaner, while keeping them easy to follow, by removing the need for many intermediate saving steps. To illustrate, here is how one could multiply mpg by 2 and divide cyl by 3 in both base R and dplyr.

# In base R
# First duplicate the data set
mtcars2 <- mtcars
# Then multiply mpg by 2
mtcars2$mpg <- mtcars$mpg * 2
# Then divide cyl by 3
mtcars2$cyl <- mtcars$cyl/3
print(mtcars2)
##                      mpg      cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4           42.0 2.000000 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag       42.0 2.000000 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710          45.6 1.333333 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive      42.8 2.000000 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout   37.4 2.666667 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant             36.2 2.000000 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360          28.6 2.666667 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D           48.8 1.333333 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230            45.6 1.333333 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280            38.4 2.000000 167.6 123 3.92 3.440 18.30  1  0    4    4
## Merc 280C           35.6 2.000000 167.6 123 3.92 3.440 18.90  1  0    4    4
## Merc 450SE          32.8 2.666667 275.8 180 3.07 4.070 17.40  0  0    3    3
## Merc 450SL          34.6 2.666667 275.8 180 3.07 3.730 17.60  0  0    3    3
## Merc 450SLC         30.4 2.666667 275.8 180 3.07 3.780 18.00  0  0    3    3
## Cadillac Fleetwood  20.8 2.666667 472.0 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 20.8 2.666667 460.0 215 3.00 5.424 17.82  0  0    3    4
## Chrysler Imperial   29.4 2.666667 440.0 230 3.23 5.345 17.42  0  0    3    4
## Fiat 128            64.8 1.333333  78.7  66 4.08 2.200 19.47  1  1    4    1
## Honda Civic         60.8 1.333333  75.7  52 4.93 1.615 18.52  1  1    4    2
## Toyota Corolla      67.8 1.333333  71.1  65 4.22 1.835 19.90  1  1    4    1
## Toyota Corona       43.0 1.333333 120.1  97 3.70 2.465 20.01  1  0    3    1
## Dodge Challenger    31.0 2.666667 318.0 150 2.76 3.520 16.87  0  0    3    2
## AMC Javelin         30.4 2.666667 304.0 150 3.15 3.435 17.30  0  0    3    2
## Camaro Z28          26.6 2.666667 350.0 245 3.73 3.840 15.41  0  0    3    4
## Pontiac Firebird    38.4 2.666667 400.0 175 3.08 3.845 17.05  0  0    3    2
## Fiat X1-9           54.6 1.333333  79.0  66 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2       52.0 1.333333 120.3  91 4.43 2.140 16.70  0  1    5    2
## Lotus Europa        60.8 1.333333  95.1 113 3.77 1.513 16.90  1  1    5    2
## Ford Pantera L      31.6 2.666667 351.0 264 4.22 3.170 14.50  0  1    5    4
## Ferrari Dino        39.4 2.000000 145.0 175 3.62 2.770 15.50  0  1    5    6
## Maserati Bora       30.0 2.666667 301.0 335 3.54 3.570 14.60  0  1    5    8
## Volvo 142E          42.8 1.333333 121.0 109 4.11 2.780 18.60  1  1    4    2
# In dplyr
mtcars2 <- mtcars %>%
  mutate(mpg = mpg * 2,
         cyl = cyl/3) %>%
  print()
##                      mpg      cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4           42.0 2.000000 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag       42.0 2.000000 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710          45.6 1.333333 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive      42.8 2.000000 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout   37.4 2.666667 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant             36.2 2.000000 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360          28.6 2.666667 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D           48.8 1.333333 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230            45.6 1.333333 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280            38.4 2.000000 167.6 123 3.92 3.440 18.30  1  0    4    4
## Merc 280C           35.6 2.000000 167.6 123 3.92 3.440 18.90  1  0    4    4
## Merc 450SE          32.8 2.666667 275.8 180 3.07 4.070 17.40  0  0    3    3
## Merc 450SL          34.6 2.666667 275.8 180 3.07 3.730 17.60  0  0    3    3
## Merc 450SLC         30.4 2.666667 275.8 180 3.07 3.780 18.00  0  0    3    3
## Cadillac Fleetwood  20.8 2.666667 472.0 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 20.8 2.666667 460.0 215 3.00 5.424 17.82  0  0    3    4
## Chrysler Imperial   29.4 2.666667 440.0 230 3.23 5.345 17.42  0  0    3    4
## Fiat 128            64.8 1.333333  78.7  66 4.08 2.200 19.47  1  1    4    1
## Honda Civic         60.8 1.333333  75.7  52 4.93 1.615 18.52  1  1    4    2
## Toyota Corolla      67.8 1.333333  71.1  65 4.22 1.835 19.90  1  1    4    1
## Toyota Corona       43.0 1.333333 120.1  97 3.70 2.465 20.01  1  0    3    1
## Dodge Challenger    31.0 2.666667 318.0 150 2.76 3.520 16.87  0  0    3    2
## AMC Javelin         30.4 2.666667 304.0 150 3.15 3.435 17.30  0  0    3    2
## Camaro Z28          26.6 2.666667 350.0 245 3.73 3.840 15.41  0  0    3    4
## Pontiac Firebird    38.4 2.666667 400.0 175 3.08 3.845 17.05  0  0    3    2
## Fiat X1-9           54.6 1.333333  79.0  66 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2       52.0 1.333333 120.3  91 4.43 2.140 16.70  0  1    5    2
## Lotus Europa        60.8 1.333333  95.1 113 3.77 1.513 16.90  1  1    5    2
## Ford Pantera L      31.6 2.666667 351.0 264 4.22 3.170 14.50  0  1    5    4
## Ferrari Dino        39.4 2.000000 145.0 175 3.62 2.770 15.50  0  1    5    6
## Maserati Bora       30.0 2.666667 301.0 335 3.54 3.570 14.60  0  1    5    8
## Volvo 142E          42.8 1.333333 121.0 109 4.11 2.780 18.60  1  1    4    2

Again, the data sets are identical in the end, but in the dplyr case only one assignment operator was necessary. In base R, three were used. Also, in dplyr the data set mtcars is only referenced once and mtcars2 is only referenced when it is assigned. In base R, both data sets are referenced multiple times. The time savings and clarity that come from using the pipe are not obvious in these simple examples, but as your work becomes more complex you will find the pipe more and more valuable.

Two additional notes: 1) For folks coming from object-oriented programming languages, it may help to think of the pipe as similar to method chaining. The two are not identical, but the comparison may help explain why the pipe is worthwhile. 2) R 4.1.0 and later include a native pipe, |>, which does not have to be loaded from magrittr. To make RStudio's pipe shortcut (Ctrl+Shift+M, or Cmd+Shift+M on a Mac) insert |> instead of %>%, go to Tools > Global Options > Code and select “Use native pipe operator”.
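
As a quick check, the sketch below runs the same short pipeline with both pipes and compares the results. It assumes R 4.1.0 or later (for |>) and that dplyr is installed; the names with_magrittr and with_native are only illustrative.

```r
# Assumes R >= 4.1.0 (for the native pipe |>) and the dplyr package.
library(dplyr)  # dplyr re-exports magrittr's %>%

# The same pipeline written with the magrittr pipe...
with_magrittr <- mtcars %>%
  mutate(mpg = mpg * 2) %>%
  filter(cyl == 6)

# ...and with the native pipe
with_native <- mtcars |>
  mutate(mpg = mpg * 2) |>
  filter(cyl == 6)

# TRUE: in both cases the data frame is passed in as the first argument
identical(with_magrittr, with_native)
```

For simple chains like these the two pipes are interchangeable, but they differ in some details (for example, how a placeholder for the piped value is written), so check before swapping one for the other in more complex code.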

filter

dplyr’s filter() function keeps only the rows that meet certain criteria. Suppose you are using the mtcars data set and only want to see cars with exactly 6 cylinders. Then with dplyr:

mtcars %>%
  filter(cyl == 6)
##                 mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4      21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Valiant        18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Merc 280       19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## Merc 280C      17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## Ferrari Dino   19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6

Or suppose you only want cars with 4 or more forward gears:

mtcars %>%
  filter(gear >= 4)
##                 mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4      21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710     22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Merc 240D      24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230       22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280       19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## Merc 280C      17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## Fiat 128       32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Honda Civic    30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## Fiat X1-9      27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Ford Pantera L 15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## Ferrari Dino   19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## Maserati Bora  15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

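Combine filter() with logical operators and other functions for more complex selections of observations. In the example below, the car names (which mtcars stores as row names rather than as a column) are first copied into a make column with mutate(), and grepl() is then used to keep only cars made by Mazda: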

mtcars %>%
  mutate(make = rownames(mtcars)) %>%
  filter(grepl("Mazda", make))
##               mpg cyl disp  hp drat    wt  qsec vs am gear carb          make
## Mazda RX4      21   6  160 110  3.9 2.620 16.46  0  1    4    4     Mazda RX4
## Mazda RX4 Wag  21   6  160 110  3.9 2.875 17.02  0  1    4    4 Mazda RX4 Wag
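
To show the logical operators themselves, here is a sketch that assumes dplyr is loaded and uses only columns already in mtcars. Separate conditions with commas (or &) when all of them must hold, use | when any one of them may hold, and use %in% to match against a set of values. The object names are illustrative.

```r
library(dplyr)

# Commas combine conditions with AND: 6-cylinder cars with 4 or more gears.
# filter(cyl == 6 & gear >= 4) gives the same result.
six_cyl_many_gears <- mtcars %>%
  filter(cyl == 6, gear >= 4)

# | means OR: cars with 4 cylinders or more than 200 horsepower
four_cyl_or_powerful <- mtcars %>%
  filter(cyl == 4 | hp > 200)

# %in% keeps rows whose value appears in a set: 4- or 8-cylinder cars
four_or_eight_cyl <- mtcars %>%
  filter(cyl %in% c(4, 8))

nrow(six_cyl_many_gears)    # 5
nrow(four_cyl_or_powerful)  # 18
nrow(four_or_eight_cyl)     # 25
```

The first filter keeps 5 of the 7 six-cylinder cars shown earlier; Hornet 4 Drive and Valiant drop out because they have only 3 gears.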

slice

slice() selects rows by their position (row number). For example, to select rows 1, 3, and 5 of mtcars:

mtcars %>%
  slice(c(1, 3, 5))
##                    mpg cyl disp  hp drat   wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.62 16.46  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.32 18.61  1  1    4    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.44 17.02  0  0    3    2

Or to select every odd-numbered row:

mtcars %>%
  slice(seq(1, nrow(mtcars), by = 2))
##                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4          21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Datsun 710         22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet Sportabout  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Duster 360         14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 230           22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280C          17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## Merc 450SL         17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## Cadillac Fleetwood 10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## Chrysler Imperial  14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## Honda Civic        30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Toyota Corona      21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## AMC Javelin        15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## Pontiac Firebird   19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## Porsche 914-2      26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Ford Pantera L     15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## Maserati Bora      15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8

There are also helper functions, such as slice_head() and slice_tail(), for quickly selecting the first or last rows:

# First 5 rows
slice_head(mtcars, n = 5)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
# Last 5 rows
slice_tail(mtcars, n = 5)
##                 mpg cyl  disp  hp drat    wt qsec vs am gear carb
## Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.9  1  1    5    2
## Ford Pantera L 15.8   8 351.0 264 4.22 3.170 14.5  0  1    5    4
## Ferrari Dino   19.7   6 145.0 175 3.62 2.770 15.5  0  1    5    6
## Maserati Bora  15.0   8 301.0 335 3.54 3.570 14.6  0  1    5    8
## Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.6  1  1    4    2
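
slice_head() and slice_tail() pick rows by position. A related pair, slice_max() and slice_min(), picks the rows with the largest or smallest values of a column instead. A sketch, assuming a recent version of dplyr (1.0.0 or later) and illustrative object names:

```r
library(dplyr)

# The three highest-mpg cars. Honda Civic and Lotus Europa tie for third
# at 30.4, and ties are kept by default, so four rows come back.
top_mpg <- mtcars %>%
  slice_max(mpg, n = 3)

# with_ties = FALSE returns exactly n rows
top_mpg_no_ties <- mtcars %>%
  slice_max(mpg, n = 3, with_ties = FALSE)

# slice_min() is the mirror image: the two lowest-mpg cars
bottom_mpg <- mtcars %>%
  slice_min(mpg, n = 2)

nrow(top_mpg)          # 4
nrow(top_mpg_no_ties)  # 3
```

The tie behavior is worth remembering: if you need exactly n rows, for example to build a fixed-size table, set with_ties = FALSE.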

arrange

arrange() sorts data. You provide a column and, by default, it sorts the rows by that column in ascending order. Compare the following:

# Not arranged
mtcars
##                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
# Data arranged by mpg
mtcars %>%
  arrange(mpg)
##                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
## Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1

In the first case, the rows of mtcars appear in the data set's original, unsorted order. After using arrange(), the rows are ordered from lowest to highest mpg. Wrapping a column in desc() arranges the rows by that column in descending order:

mtcars %>%
  arrange(desc(mpg))
##                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
## Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4

When you provide multiple columns, arrange() sorts by the first column and uses each later column to break ties. Compare sorting by mpg alone with sorting by mpg and then disp:

mtcars %>%
  arrange(mpg)
##                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
## Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
mtcars %>%
  arrange(mpg, disp)
##                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
## Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1

In the second case, Lincoln Continental comes before Cadillac Fleetwood: the two are tied on mpg (10.4), and Lincoln Continental has the lower disp. desc() can be applied to a tie-breaking column as well. With desc(disp), Cadillac Fleetwood comes first:

mtcars %>%
  arrange(mpg, desc(disp))
##                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
## Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
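
mtcars has no missing values, so the examples above do not show how arrange() treats NA. dplyr's documentation states that missing values are sorted to the end, even when the column is wrapped in desc(). This sketch, which uses a small made-up data frame called scores (hypothetical, not part of the course data), shows that behavior:

```r
library(dplyr)

# A small, made-up data frame with one missing score
scores <- data.frame(name  = c("a", "b", "c", "d"),
                     score = c(3, NA, 1, 2))

# Ascending order: 1, 2, 3, then the NA row
ascending <- scores %>% arrange(score)

# Descending order: 3, 2, 1, and the NA row is still last
descending <- scores %>% arrange(desc(score))

ascending$name   # "c" "d" "a" "b"
descending$name  # "a" "d" "c" "b"
```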

select

select() is among the most commonly used dplyr verbs. Its job is simple but important: choosing which variables (columns) to keep in a data frame. For example, if you only want to keep the mpg variable from the mtcars data frame, you can do the following:

mtcars %>%
  select(mpg)
##                      mpg
## Mazda RX4           21.0
## Mazda RX4 Wag       21.0
## Datsun 710          22.8
## Hornet 4 Drive      21.4
## Hornet Sportabout   18.7
## Valiant             18.1
## Duster 360          14.3
## Merc 240D           24.4
## Merc 230            22.8
## Merc 280            19.2
## Merc 280C           17.8
## Merc 450SE          16.4
## Merc 450SL          17.3
## Merc 450SLC         15.2
## Cadillac Fleetwood  10.4
## Lincoln Continental 10.4
## Chrysler Imperial   14.7
## Fiat 128            32.4
## Honda Civic         30.4
## Toyota Corolla      33.9
## Toyota Corona       21.5
## Dodge Challenger    15.5
## AMC Javelin         15.2
## Camaro Z28          13.3
## Pontiac Firebird    19.2
## Fiat X1-9           27.3
## Porsche 914-2       26.0
## Lotus Europa        30.4
## Ford Pantera L      15.8
## Ferrari Dino        19.7
## Maserati Bora       15.0
## Volvo 142E          21.4

If instead you’d like to keep multiple variables (say miles per gallon, gears, and carburetors), list them:

mtcars %>%
  select(mpg, gear, carb)
##                      mpg gear carb
## Mazda RX4           21.0    4    4
## Mazda RX4 Wag       21.0    4    4
## Datsun 710          22.8    4    1
## Hornet 4 Drive      21.4    3    1
## Hornet Sportabout   18.7    3    2
## Valiant             18.1    3    1
## Duster 360          14.3    3    4
## Merc 240D           24.4    4    2
## Merc 230            22.8    4    2
## Merc 280            19.2    4    4
## Merc 280C           17.8    4    4
## Merc 450SE          16.4    3    3
## Merc 450SL          17.3    3    3
## Merc 450SLC         15.2    3    3
## Cadillac Fleetwood  10.4    3    4
## Lincoln Continental 10.4    3    4
## Chrysler Imperial   14.7    3    4
## Fiat 128            32.4    4    1
## Honda Civic         30.4    4    2
## Toyota Corolla      33.9    4    1
## Toyota Corona       21.5    3    1
## Dodge Challenger    15.5    3    2
## AMC Javelin         15.2    3    2
## Camaro Z28          13.3    3    4
## Pontiac Firebird    19.2    3    2
## Fiat X1-9           27.3    4    1
## Porsche 914-2       26.0    5    2
## Lotus Europa        30.4    5    2
## Ford Pantera L      15.8    5    4
## Ferrari Dino        19.7    5    6
## Maserati Bora       15.0    5    8
## Volvo 142E          21.4    4    2
mtcars %>%
  select(c("mpg", "gear", "carb"))
##                      mpg gear carb
## Mazda RX4           21.0    4    4
## Mazda RX4 Wag       21.0    4    4
## Datsun 710          22.8    4    1
## Hornet 4 Drive      21.4    3    1
## Hornet Sportabout   18.7    3    2
## Valiant             18.1    3    1
## Duster 360          14.3    3    4
## Merc 240D           24.4    4    2
## Merc 230            22.8    4    2
## Merc 280            19.2    4    4
## Merc 280C           17.8    4    4
## Merc 450SE          16.4    3    3
## Merc 450SL          17.3    3    3
## Merc 450SLC         15.2    3    3
## Cadillac Fleetwood  10.4    3    4
## Lincoln Continental 10.4    3    4
## Chrysler Imperial   14.7    3    4
## Fiat 128            32.4    4    1
## Honda Civic         30.4    4    2
## Toyota Corolla      33.9    4    1
## Toyota Corona       21.5    3    1
## Dodge Challenger    15.5    3    2
## AMC Javelin         15.2    3    2
## Camaro Z28          13.3    3    4
## Pontiac Firebird    19.2    3    2
## Fiat X1-9           27.3    4    1
## Porsche 914-2       26.0    5    2
## Lotus Europa        30.4    5    2
## Ford Pantera L      15.8    5    4
## Ferrari Dino        19.7    5    6
## Maserati Bora       15.0    5    8
## Volvo 142E          21.4    4    2
mtcars %>%
  select(c(mpg, gear, carb))
##                      mpg gear carb
## Mazda RX4           21.0    4    4
## Mazda RX4 Wag       21.0    4    4
## Datsun 710          22.8    4    1
## Hornet 4 Drive      21.4    3    1
## Hornet Sportabout   18.7    3    2
## Valiant             18.1    3    1
## Duster 360          14.3    3    4
## Merc 240D           24.4    4    2
## Merc 230            22.8    4    2
## Merc 280            19.2    4    4
## Merc 280C           17.8    4    4
## Merc 450SE          16.4    3    3
## Merc 450SL          17.3    3    3
## Merc 450SLC         15.2    3    3
## Cadillac Fleetwood  10.4    3    4
## Lincoln Continental 10.4    3    4
## Chrysler Imperial   14.7    3    4
## Fiat 128            32.4    4    1
## Honda Civic         30.4    4    2
## Toyota Corolla      33.9    4    1
## Toyota Corona       21.5    3    1
## Dodge Challenger    15.5    3    2
## AMC Javelin         15.2    3    2
## Camaro Z28          13.3    3    4
## Pontiac Firebird    19.2    3    2
## Fiat X1-9           27.3    4    1
## Porsche 914-2       26.0    5    2
## Lotus Europa        30.4    5    2
## Ford Pantera L      15.8    5    4
## Ferrari Dino        19.7    5    6
## Maserati Bora       15.0    5    8
## Volvo 142E          21.4    4    2

As shown above, columns can be provided as a comma-separated list of names, as a character vector of quoted names, or as a vector of bare (unquoted) names built with c(). The vector forms are especially useful when you want to remove several variables at once: put a minus sign in front of the vector.

mtcars %>%
  select(-c("mpg", "gear", "carb"))
##                     cyl  disp  hp drat    wt  qsec vs am
## Mazda RX4             6 160.0 110 3.90 2.620 16.46  0  1
## Mazda RX4 Wag         6 160.0 110 3.90 2.875 17.02  0  1
## Datsun 710            4 108.0  93 3.85 2.320 18.61  1  1
## Hornet 4 Drive        6 258.0 110 3.08 3.215 19.44  1  0
## Hornet Sportabout     8 360.0 175 3.15 3.440 17.02  0  0
## Valiant               6 225.0 105 2.76 3.460 20.22  1  0
## Duster 360            8 360.0 245 3.21 3.570 15.84  0  0
## Merc 240D             4 146.7  62 3.69 3.190 20.00  1  0
## Merc 230              4 140.8  95 3.92 3.150 22.90  1  0
## Merc 280              6 167.6 123 3.92 3.440 18.30  1  0
## Merc 280C             6 167.6 123 3.92 3.440 18.90  1  0
## Merc 450SE            8 275.8 180 3.07 4.070 17.40  0  0
## Merc 450SL            8 275.8 180 3.07 3.730 17.60  0  0
## Merc 450SLC           8 275.8 180 3.07 3.780 18.00  0  0
## Cadillac Fleetwood    8 472.0 205 2.93 5.250 17.98  0  0
## Lincoln Continental   8 460.0 215 3.00 5.424 17.82  0  0
## Chrysler Imperial     8 440.0 230 3.23 5.345 17.42  0  0
## Fiat 128              4  78.7  66 4.08 2.200 19.47  1  1
## Honda Civic           4  75.7  52 4.93 1.615 18.52  1  1
## Toyota Corolla        4  71.1  65 4.22 1.835 19.90  1  1
## Toyota Corona         4 120.1  97 3.70 2.465 20.01  1  0
## Dodge Challenger      8 318.0 150 2.76 3.520 16.87  0  0
## AMC Javelin           8 304.0 150 3.15 3.435 17.30  0  0
## Camaro Z28            8 350.0 245 3.73 3.840 15.41  0  0
## Pontiac Firebird      8 400.0 175 3.08 3.845 17.05  0  0
## Fiat X1-9             4  79.0  66 4.08 1.935 18.90  1  1
## Porsche 914-2         4 120.3  91 4.43 2.140 16.70  0  1
## Lotus Europa          4  95.1 113 3.77 1.513 16.90  1  1
## Ford Pantera L        8 351.0 264 4.22 3.170 14.50  0  1
## Ferrari Dino          6 145.0 175 3.62 2.770 15.50  0  1
## Maserati Bora         8 301.0 335 3.54 3.570 14.60  0  1
## Volvo 142E            4 121.0 109 4.11 2.780 18.60  1  1

Now mpg, gear, and carb have been removed, and the remaining eight variables are kept.
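
Beyond listing names, select() accepts selection helpers from the tidyselect package, which dplyr makes available. A sketch of three common ones, assuming dplyr is loaded (the object names are illustrative):

```r
library(dplyr)

# starts_with(): every column whose name begins with "d"
d_cols <- mtcars %>% select(starts_with("d"))
names(d_cols)        # "disp" "drat"

# A colon selects a contiguous range of columns, here mpg through hp
first_four <- mtcars %>% select(mpg:hp)
names(first_four)    # "mpg" "cyl" "disp" "hp"

# everything() selects all remaining columns, so this moves carb to the
# front while keeping every other column in its original order
carb_first <- mtcars %>% select(carb, everything())
names(carb_first)
```

Helpers like these save typing on wide data sets, where listing every column by name quickly becomes error-prone.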

rename

Sometimes column names are inconvenient to work with, not descriptive, or otherwise do not meet your needs. rename() remedies this: the new name goes on the left of the = and the existing name on the right.

mtcars %>%
  rename(MilesPerGallon = mpg)
##                     MilesPerGallon cyl  disp  hp drat    wt  qsec vs am gear
## Mazda RX4                     21.0   6 160.0 110 3.90 2.620 16.46  0  1    4
## Mazda RX4 Wag                 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4
## Datsun 710                    22.8   4 108.0  93 3.85 2.320 18.61  1  1    4
## Hornet 4 Drive                21.4   6 258.0 110 3.08 3.215 19.44  1  0    3
## Hornet Sportabout             18.7   8 360.0 175 3.15 3.440 17.02  0  0    3
## Valiant                       18.1   6 225.0 105 2.76 3.460 20.22  1  0    3
## Duster 360                    14.3   8 360.0 245 3.21 3.570 15.84  0  0    3
## Merc 240D                     24.4   4 146.7  62 3.69 3.190 20.00  1  0    4
## Merc 230                      22.8   4 140.8  95 3.92 3.150 22.90  1  0    4
## Merc 280                      19.2   6 167.6 123 3.92 3.440 18.30  1  0    4
## Merc 280C                     17.8   6 167.6 123 3.92 3.440 18.90  1  0    4
## Merc 450SE                    16.4   8 275.8 180 3.07 4.070 17.40  0  0    3
## Merc 450SL                    17.3   8 275.8 180 3.07 3.730 17.60  0  0    3
## Merc 450SLC                   15.2   8 275.8 180 3.07 3.780 18.00  0  0    3
## Cadillac Fleetwood            10.4   8 472.0 205 2.93 5.250 17.98  0  0    3
## Lincoln Continental           10.4   8 460.0 215 3.00 5.424 17.82  0  0    3
## Chrysler Imperial             14.7   8 440.0 230 3.23 5.345 17.42  0  0    3
## Fiat 128                      32.4   4  78.7  66 4.08 2.200 19.47  1  1    4
## Honda Civic                   30.4   4  75.7  52 4.93 1.615 18.52  1  1    4
## Toyota Corolla                33.9   4  71.1  65 4.22 1.835 19.90  1  1    4
## Toyota Corona                 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3
## Dodge Challenger              15.5   8 318.0 150 2.76 3.520 16.87  0  0    3
## AMC Javelin                   15.2   8 304.0 150 3.15 3.435 17.30  0  0    3
## Camaro Z28                    13.3   8 350.0 245 3.73 3.840 15.41  0  0    3
## Pontiac Firebird              19.2   8 400.0 175 3.08 3.845 17.05  0  0    3
## Fiat X1-9                     27.3   4  79.0  66 4.08 1.935 18.90  1  1    4
## Porsche 914-2                 26.0   4 120.3  91 4.43 2.140 16.70  0  1    5
## Lotus Europa                  30.4   4  95.1 113 3.77 1.513 16.90  1  1    5
## Ford Pantera L                15.8   8 351.0 264 4.22 3.170 14.50  0  1    5
## Ferrari Dino                  19.7   6 145.0 175 3.62 2.770 15.50  0  1    5
## Maserati Bora                 15.0   8 301.0 335 3.54 3.570 14.60  0  1    5
## Volvo 142E                    21.4   4 121.0 109 4.11 2.780 18.60  1  1    4
##                     carb
## Mazda RX4              4
## Mazda RX4 Wag          4
## Datsun 710             1
## Hornet 4 Drive         1
## Hornet Sportabout      2
## Valiant                1
## Duster 360             4
## Merc 240D              2
## Merc 230               2
## Merc 280               4
## Merc 280C              4
## Merc 450SE             3
## Merc 450SL             3
## Merc 450SLC            3
## Cadillac Fleetwood     4
## Lincoln Continental    4
## Chrysler Imperial      4
## Fiat 128               1
## Honda Civic            2
## Toyota Corolla         1
## Toyota Corona          1
## Dodge Challenger       2
## AMC Javelin            2
## Camaro Z28             4
## Pontiac Firebird       2
## Fiat X1-9              1
## Porsche 914-2          2
## Lotus Europa           2
## Ford Pantera L         4
## Ferrari Dino           6
## Maserati Bora          8
## Volvo 142E             2

The new column name goes on the left-hand side of the `=` and the old column name goes on the right-hand side. To rename multiple columns at once, separate each expression with a comma:

mtcars %>%
  rename(MilesPerGallon = mpg,
         Cylinders = cyl,
         Carburetors = carb)
##                     MilesPerGallon Cylinders  disp  hp drat    wt  qsec vs am
## Mazda RX4                     21.0         6 160.0 110 3.90 2.620 16.46  0  1
## Mazda RX4 Wag                 21.0         6 160.0 110 3.90 2.875 17.02  0  1
## Datsun 710                    22.8         4 108.0  93 3.85 2.320 18.61  1  1
## Hornet 4 Drive                21.4         6 258.0 110 3.08 3.215 19.44  1  0
## Hornet Sportabout             18.7         8 360.0 175 3.15 3.440 17.02  0  0
## Valiant                       18.1         6 225.0 105 2.76 3.460 20.22  1  0
## Duster 360                    14.3         8 360.0 245 3.21 3.570 15.84  0  0
## Merc 240D                     24.4         4 146.7  62 3.69 3.190 20.00  1  0
## Merc 230                      22.8         4 140.8  95 3.92 3.150 22.90  1  0
## Merc 280                      19.2         6 167.6 123 3.92 3.440 18.30  1  0
## Merc 280C                     17.8         6 167.6 123 3.92 3.440 18.90  1  0
## Merc 450SE                    16.4         8 275.8 180 3.07 4.070 17.40  0  0
## Merc 450SL                    17.3         8 275.8 180 3.07 3.730 17.60  0  0
## Merc 450SLC                   15.2         8 275.8 180 3.07 3.780 18.00  0  0
## Cadillac Fleetwood            10.4         8 472.0 205 2.93 5.250 17.98  0  0
## Lincoln Continental           10.4         8 460.0 215 3.00 5.424 17.82  0  0
## Chrysler Imperial             14.7         8 440.0 230 3.23 5.345 17.42  0  0
## Fiat 128                      32.4         4  78.7  66 4.08 2.200 19.47  1  1
## Honda Civic                   30.4         4  75.7  52 4.93 1.615 18.52  1  1
## Toyota Corolla                33.9         4  71.1  65 4.22 1.835 19.90  1  1
## Toyota Corona                 21.5         4 120.1  97 3.70 2.465 20.01  1  0
## Dodge Challenger              15.5         8 318.0 150 2.76 3.520 16.87  0  0
## AMC Javelin                   15.2         8 304.0 150 3.15 3.435 17.30  0  0
## Camaro Z28                    13.3         8 350.0 245 3.73 3.840 15.41  0  0
## Pontiac Firebird              19.2         8 400.0 175 3.08 3.845 17.05  0  0
## Fiat X1-9                     27.3         4  79.0  66 4.08 1.935 18.90  1  1
## Porsche 914-2                 26.0         4 120.3  91 4.43 2.140 16.70  0  1
## Lotus Europa                  30.4         4  95.1 113 3.77 1.513 16.90  1  1
## Ford Pantera L                15.8         8 351.0 264 4.22 3.170 14.50  0  1
## Ferrari Dino                  19.7         6 145.0 175 3.62 2.770 15.50  0  1
## Maserati Bora                 15.0         8 301.0 335 3.54 3.570 14.60  0  1
## Volvo 142E                    21.4         4 121.0 109 4.11 2.780 18.60  1  1
##                     gear Carburetors
## Mazda RX4              4           4
## Mazda RX4 Wag          4           4
## Datsun 710             4           1
## Hornet 4 Drive         3           1
## Hornet Sportabout      3           2
## Valiant                3           1
## Duster 360             3           4
## Merc 240D              4           2
## Merc 230               4           2
## Merc 280               4           4
## Merc 280C              4           4
## Merc 450SE             3           3
## Merc 450SL             3           3
## Merc 450SLC            3           3
## Cadillac Fleetwood     3           4
## Lincoln Continental    3           4
## Chrysler Imperial      3           4
## Fiat 128               4           1
## Honda Civic            4           2
## Toyota Corolla         4           1
## Toyota Corona          3           1
## Dodge Challenger       3           2
## AMC Javelin            3           2
## Camaro Z28             3           4
## Pontiac Firebird       3           2
## Fiat X1-9              4           1
## Porsche 914-2          5           2
## Lotus Europa           5           2
## Ford Pantera L         5           4
## Ferrari Dino           5           6
## Maserati Bora          5           8
## Volvo 142E             4           2
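
When many columns need renaming by the same rule, dplyr also provides rename_with(), which applies a function to the names of the selected columns. The minimal sketch below assumes dplyr 1.0.0 or later and simply upper-cases two column names; toupper() stands in for whatever renaming rule you need.

```r
library(dplyr)

# rename_with() applies a function (here toupper) to the names of the
# columns chosen by .cols; every other column keeps its name.
upper_cars <- mtcars %>%
  rename_with(toupper, .cols = c(mpg, cyl))

names(upper_cars)[1:3]
# "MPG"  "CYL"  "disp"
```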

R has rules about which variable names are syntactically valid (see help(make.names) for more information). If you really need a name that breaks those rules (for example, when creating a public-facing table), you can create a non-syntactic name by wrapping it in backticks:

mtcars %>%
  rename(`Miles Per Gallon` = mpg)
##                     Miles Per Gallon cyl  disp  hp drat    wt  qsec vs am gear
## Mazda RX4                       21.0   6 160.0 110 3.90 2.620 16.46  0  1    4
## Mazda RX4 Wag                   21.0   6 160.0 110 3.90 2.875 17.02  0  1    4
## Datsun 710                      22.8   4 108.0  93 3.85 2.320 18.61  1  1    4
## Hornet 4 Drive                  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3
## Hornet Sportabout               18.7   8 360.0 175 3.15 3.440 17.02  0  0    3
## Valiant                         18.1   6 225.0 105 2.76 3.460 20.22  1  0    3
## Duster 360                      14.3   8 360.0 245 3.21 3.570 15.84  0  0    3
## Merc 240D                       24.4   4 146.7  62 3.69 3.190 20.00  1  0    4
## Merc 230                        22.8   4 140.8  95 3.92 3.150 22.90  1  0    4
## Merc 280                        19.2   6 167.6 123 3.92 3.440 18.30  1  0    4
## Merc 280C                       17.8   6 167.6 123 3.92 3.440 18.90  1  0    4
## Merc 450SE                      16.4   8 275.8 180 3.07 4.070 17.40  0  0    3
## Merc 450SL                      17.3   8 275.8 180 3.07 3.730 17.60  0  0    3
## Merc 450SLC                     15.2   8 275.8 180 3.07 3.780 18.00  0  0    3
## Cadillac Fleetwood              10.4   8 472.0 205 2.93 5.250 17.98  0  0    3
## Lincoln Continental             10.4   8 460.0 215 3.00 5.424 17.82  0  0    3
## Chrysler Imperial               14.7   8 440.0 230 3.23 5.345 17.42  0  0    3
## Fiat 128                        32.4   4  78.7  66 4.08 2.200 19.47  1  1    4
## Honda Civic                     30.4   4  75.7  52 4.93 1.615 18.52  1  1    4
## Toyota Corolla                  33.9   4  71.1  65 4.22 1.835 19.90  1  1    4
## Toyota Corona                   21.5   4 120.1  97 3.70 2.465 20.01  1  0    3
## Dodge Challenger                15.5   8 318.0 150 2.76 3.520 16.87  0  0    3
## AMC Javelin                     15.2   8 304.0 150 3.15 3.435 17.30  0  0    3
## Camaro Z28                      13.3   8 350.0 245 3.73 3.840 15.41  0  0    3
## Pontiac Firebird                19.2   8 400.0 175 3.08 3.845 17.05  0  0    3
## Fiat X1-9                       27.3   4  79.0  66 4.08 1.935 18.90  1  1    4
## Porsche 914-2                   26.0   4 120.3  91 4.43 2.140 16.70  0  1    5
## Lotus Europa                    30.4   4  95.1 113 3.77 1.513 16.90  1  1    5
## Ford Pantera L                  15.8   8 351.0 264 4.22 3.170 14.50  0  1    5
## Ferrari Dino                    19.7   6 145.0 175 3.62 2.770 15.50  0  1    5
## Maserati Bora                   15.0   8 301.0 335 3.54 3.570 14.60  0  1    5
## Volvo 142E                      21.4   4 121.0 109 4.11 2.780 18.60  1  1    4
##                     carb
## Mazda RX4              4
## Mazda RX4 Wag          4
## Datsun 710             1
## Hornet 4 Drive         1
## Hornet Sportabout      2
## Valiant                1
## Duster 360             4
## Merc 240D              2
## Merc 230               2
## Merc 280               4
## Merc 280C              4
## Merc 450SE             3
## Merc 450SL             3
## Merc 450SLC            3
## Cadillac Fleetwood     4
## Lincoln Continental    4
## Chrysler Imperial      4
## Fiat 128               1
## Honda Civic            2
## Toyota Corolla         1
## Toyota Corona          1
## Dodge Challenger       2
## AMC Javelin            2
## Camaro Z28             4
## Pontiac Firebird       2
## Fiat X1-9              1
## Porsche 914-2          2
## Lotus Europa           2
## Ford Pantera L         4
## Ferrari Dino           6
## Maserati Bora          8
## Volvo 142E             2
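
Once a column has a non-syntactic name, you need backticks every time you refer to it in later code. A short sketch (the object names pretty_cars and efficient are just illustrative):

```r
library(dplyr)

# Rename mpg to a non-syntactic name that contains spaces
pretty_cars <- mtcars %>%
  rename(`Miles Per Gallon` = mpg)

# Backticks are required whenever the column is referenced afterwards
efficient <- pretty_cars %>%
  filter(`Miles Per Gallon` > 30) %>%
  select(`Miles Per Gallon`)

efficient
```

This is one reason it can be convenient to keep syntactic names while you work and switch to presentation-friendly names only at the end.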

mutate

mutate() allows you to modify and create columns in a data frame. To use it, write the name of the column you'd like to modify or create on the left-hand side of an expression and its value on the right-hand side. For example, you can calculate the ratio of mpg to cyl in mtcars:

mtcars %>%
  mutate(mpg_to_cyl = mpg/cyl)
##                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
##                     mpg_to_cyl
## Mazda RX4             3.500000
## Mazda RX4 Wag         3.500000
## Datsun 710            5.700000
## Hornet 4 Drive        3.566667
## Hornet Sportabout     2.337500
## Valiant               3.016667
## Duster 360            1.787500
## Merc 240D             6.100000
## Merc 230              5.700000
## Merc 280              3.200000
## Merc 280C             2.966667
## Merc 450SE            2.050000
## Merc 450SL            2.162500
## Merc 450SLC           1.900000
## Cadillac Fleetwood    1.300000
## Lincoln Continental   1.300000
## Chrysler Imperial     1.837500
## Fiat 128              8.100000
## Honda Civic           7.600000
## Toyota Corolla        8.475000
## Toyota Corona         5.375000
## Dodge Challenger      1.937500
## AMC Javelin           1.900000
## Camaro Z28            1.662500
## Pontiac Firebird      2.400000
## Fiat X1-9             6.825000
## Porsche 914-2         6.500000
## Lotus Europa          7.600000
## Ford Pantera L        1.975000
## Ferrari Dino          3.283333
## Maserati Bora         1.875000
## Volvo 142E            5.350000
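
mutate() can also modify an existing column: if the name on the left-hand side already exists, its values are replaced instead of a new column being added. A minimal sketch, relying on the fact that wt in mtcars is recorded in thousands of pounds:

```r
library(dplyr)

# wt is stored in 1000s of lbs; overwrite it with the weight in pounds.
# Because wt already exists, mutate() replaces it rather than adding a column.
cars_lbs <- mtcars %>%
  mutate(wt = wt * 1000)

head(cars_lbs$wt, 3)
# 2620 2875 2320
```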

As with rename(), you can manipulate multiple variables in one call to mutate() by separating the expressions with commas. The expressions are evaluated in order, so later ones can use columns created earlier in the same call:

mtcars %>%
  mutate(mpg_to_cyl = mpg/cyl,
         mpg_to_carb = mpg/carb,
         carb_to_cyl = mpg_to_cyl/mpg_to_carb)
##                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
##                     mpg_to_cyl mpg_to_carb carb_to_cyl
## Mazda RX4             3.500000    5.250000   0.6666667
## Mazda RX4 Wag         3.500000    5.250000   0.6666667
## Datsun 710            5.700000   22.800000   0.2500000
## Hornet 4 Drive        3.566667   21.400000   0.1666667
## Hornet Sportabout     2.337500    9.350000   0.2500000
## Valiant               3.016667   18.100000   0.1666667
## Duster 360            1.787500    3.575000   0.5000000
## Merc 240D             6.100000   12.200000   0.5000000
## Merc 230              5.700000   11.400000   0.5000000
## Merc 280              3.200000    4.800000   0.6666667
## Merc 280C             2.966667    4.450000   0.6666667
## Merc 450SE            2.050000    5.466667   0.3750000
## Merc 450SL            2.162500    5.766667   0.3750000
## Merc 450SLC           1.900000    5.066667   0.3750000
## Cadillac Fleetwood    1.300000    2.600000   0.5000000
## Lincoln Continental   1.300000    2.600000   0.5000000
## Chrysler Imperial     1.837500    3.675000   0.5000000
## Fiat 128              8.100000   32.400000   0.2500000
## Honda Civic           7.600000   15.200000   0.5000000
## Toyota Corolla        8.475000   33.900000   0.2500000
## Toyota Corona         5.375000   21.500000   0.2500000
## Dodge Challenger      1.937500    7.750000   0.2500000
## AMC Javelin           1.900000    7.600000   0.2500000
## Camaro Z28            1.662500    3.325000   0.5000000
## Pontiac Firebird      2.400000    9.600000   0.2500000
## Fiat X1-9             6.825000   27.300000   0.2500000
## Porsche 914-2         6.500000   13.000000   0.5000000
## Lotus Europa          7.600000   15.200000   0.5000000
## Ford Pantera L        1.975000    3.950000   0.5000000
## Ferrari Dino          3.283333    3.283333   1.0000000
## Maserati Bora         1.875000    1.875000   1.0000000
## Volvo 142E            5.350000   10.700000   0.5000000
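
Because mpg appears in both the numerator and the denominator, carb_to_cyl simplifies to (mpg/cyl) / (mpg/carb) = carb/cyl. A quick check, sketched below, confirms that the last column was built from the two columns created before it:

```r
library(dplyr)

ratios <- mtcars %>%
  mutate(mpg_to_cyl = mpg/cyl,
         mpg_to_carb = mpg/carb,
         # Uses the two columns created just above in the same call
         carb_to_cyl = mpg_to_cyl/mpg_to_carb)

# (mpg/cyl) / (mpg/carb) = carb/cyl, so the two vectors should match;
# all.equal() tolerates small floating-point differences
all.equal(ratios$carb_to_cyl, ratios$carb / ratios$cyl)
# TRUE
```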

Sometimes you want to apply the same transformation to multiple columns at once. There are several ways to do this using dplyr, but a common one is the function across(). across() is especially useful when summarizing, but it can also be used with mutate(). For example, suppose you wanted to calculate the mean of cyl, mpg, and carb:

mtcars %>%
  mutate(across(c(cyl, mpg, carb), ~ mean(.x, na.rm = TRUE))) %>%
  select(mpg, cyl, carb)
##                          mpg    cyl   carb
## Mazda RX4           20.09062 6.1875 2.8125
## Mazda RX4 Wag       20.09062 6.1875 2.8125
## Datsun 710          20.09062 6.1875 2.8125
## Hornet 4 Drive      20.09062 6.1875 2.8125
## Hornet Sportabout   20.09062 6.1875 2.8125
## Valiant             20.09062 6.1875 2.8125
## Duster 360          20.09062 6.1875 2.8125
## Merc 240D           20.09062 6.1875 2.8125
## Merc 230            20.09062 6.1875 2.8125
## Merc 280            20.09062 6.1875 2.8125
## Merc 280C           20.09062 6.1875 2.8125
## Merc 450SE          20.09062 6.1875 2.8125
## Merc 450SL          20.09062 6.1875 2.8125
## Merc 450SLC         20.09062 6.1875 2.8125
## Cadillac Fleetwood  20.09062 6.1875 2.8125
## Lincoln Continental 20.09062 6.1875 2.8125
## Chrysler Imperial   20.09062 6.1875 2.8125
## Fiat 128            20.09062 6.1875 2.8125
## Honda Civic         20.09062 6.1875 2.8125
## Toyota Corolla      20.09062 6.1875 2.8125
## Toyota Corona       20.09062 6.1875 2.8125
## Dodge Challenger    20.09062 6.1875 2.8125
## AMC Javelin         20.09062 6.1875 2.8125
## Camaro Z28          20.09062 6.1875 2.8125
## Pontiac Firebird    20.09062 6.1875 2.8125
## Fiat X1-9           20.09062 6.1875 2.8125
## Porsche 914-2       20.09062 6.1875 2.8125
## Lotus Europa        20.09062 6.1875 2.8125
## Ford Pantera L      20.09062 6.1875 2.8125
## Ferrari Dino        20.09062 6.1875 2.8125
## Maserati Bora       20.09062 6.1875 2.8125
## Volvo 142E          20.09062 6.1875 2.8125

This example is not particularly useful, since you would rarely want to imply that all car models have the same mpg, cyl, and carb, but it illustrates the power of across().
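
A more typical use is to pair across() with summarize(), which collapses the data to one row instead of repeating the means on every row. The sketch below uses across()'s .names argument, where {.col} stands for each original column name; the output name pattern "mean_{.col}" is just an illustrative choice.

```r
library(dplyr)

# One row holding the mean of each selected column
car_means <- mtcars %>%
  summarize(across(c(mpg, cyl, carb),
                   ~ mean(.x, na.rm = TRUE),
                   .names = "mean_{.col}"))

car_means
```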

relocate

relocate() changes the position of columns in a data frame. You provide the columns you would like to move and a column to place them in front of (with .before) or behind (with .after). For example, to move the disp and hp columns before the mpg column you could do the following:

mtcars %>%
  relocate(disp, hp, .before = mpg)
##                      disp  hp  mpg cyl drat    wt  qsec vs am gear carb
## Mazda RX4           160.0 110 21.0   6 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag       160.0 110 21.0   6 3.90 2.875 17.02  0  1    4    4
## Datsun 710          108.0  93 22.8   4 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive      258.0 110 21.4   6 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout   360.0 175 18.7   8 3.15 3.440 17.02  0  0    3    2
## Valiant             225.0 105 18.1   6 2.76 3.460 20.22  1  0    3    1
## Duster 360          360.0 245 14.3   8 3.21 3.570 15.84  0  0    3    4
## Merc 240D           146.7  62 24.4   4 3.69 3.190 20.00  1  0    4    2
## Merc 230            140.8  95 22.8   4 3.92 3.150 22.90  1  0    4    2
## Merc 280            167.6 123 19.2   6 3.92 3.440 18.30  1  0    4    4
## Merc 280C           167.6 123 17.8   6 3.92 3.440 18.90  1  0    4    4
## Merc 450SE          275.8 180 16.4   8 3.07 4.070 17.40  0  0    3    3
## Merc 450SL          275.8 180 17.3   8 3.07 3.730 17.60  0  0    3    3
## Merc 450SLC         275.8 180 15.2   8 3.07 3.780 18.00  0  0    3    3
## Cadillac Fleetwood  472.0 205 10.4   8 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 460.0 215 10.4   8 3.00 5.424 17.82  0  0    3    4
## Chrysler Imperial   440.0 230 14.7   8 3.23 5.345 17.42  0  0    3    4
## Fiat 128             78.7  66 32.4   4 4.08 2.200 19.47  1  1    4    1
## Honda Civic          75.7  52 30.4   4 4.93 1.615 18.52  1  1    4    2
## Toyota Corolla       71.1  65 33.9   4 4.22 1.835 19.90  1  1    4    1
## Toyota Corona       120.1  97 21.5   4 3.70 2.465 20.01  1  0    3    1
## Dodge Challenger    318.0 150 15.5   8 2.76 3.520 16.87  0  0    3    2
## AMC Javelin         304.0 150 15.2   8 3.15 3.435 17.30  0  0    3    2
## Camaro Z28          350.0 245 13.3   8 3.73 3.840 15.41  0  0    3    4
## Pontiac Firebird    400.0 175 19.2   8 3.08 3.845 17.05  0  0    3    2
## Fiat X1-9            79.0  66 27.3   4 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2       120.3  91 26.0   4 4.43 2.140 16.70  0  1    5    2
## Lotus Europa         95.1 113 30.4   4 3.77 1.513 16.90  1  1    5    2
## Ford Pantera L      351.0 264 15.8   8 4.22 3.170 14.50  0  1    5    4
## Ferrari Dino        145.0 175 19.7   6 3.62 2.770 15.50  0  1    5    6
## Maserati Bora       301.0 335 15.0   8 3.54 3.570 14.60  0  1    5    8
## Volvo 142E          121.0 109 21.4   4 4.11 2.780 18.60  1  1    4    2

or, equivalently,

mtcars %>%
  relocate(mpg, cyl, .after = hp)
##                      disp  hp  mpg cyl drat    wt  qsec vs am gear carb
## Mazda RX4           160.0 110 21.0   6 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag       160.0 110 21.0   6 3.90 2.875 17.02  0  1    4    4
## Datsun 710          108.0  93 22.8   4 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive      258.0 110 21.4   6 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout   360.0 175 18.7   8 3.15 3.440 17.02  0  0    3    2
## Valiant             225.0 105 18.1   6 2.76 3.460 20.22  1  0    3    1
## Duster 360          360.0 245 14.3   8 3.21 3.570 15.84  0  0    3    4
## Merc 240D           146.7  62 24.4   4 3.69 3.190 20.00  1  0    4    2
## Merc 230            140.8  95 22.8   4 3.92 3.150 22.90  1  0    4    2
## Merc 280            167.6 123 19.2   6 3.92 3.440 18.30  1  0    4    4
## Merc 280C           167.6 123 17.8   6 3.92 3.440 18.90  1  0    4    4
## Merc 450SE          275.8 180 16.4   8 3.07 4.070 17.40  0  0    3    3
## Merc 450SL          275.8 180 17.3   8 3.07 3.730 17.60  0  0    3    3
## Merc 450SLC         275.8 180 15.2   8 3.07 3.780 18.00  0  0    3    3
## Cadillac Fleetwood  472.0 205 10.4   8 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 460.0 215 10.4   8 3.00 5.424 17.82  0  0    3    4
## Chrysler Imperial   440.0 230 14.7   8 3.23 5.345 17.42  0  0    3    4
## Fiat 128             78.7  66 32.4   4 4.08 2.200 19.47  1  1    4    1
## Honda Civic          75.7  52 30.4   4 4.93 1.615 18.52  1  1    4    2
## Toyota Corolla       71.1  65 33.9   4 4.22 1.835 19.90  1  1    4    1
## Toyota Corona       120.1  97 21.5   4 3.70 2.465 20.01  1  0    3    1
## Dodge Challenger    318.0 150 15.5   8 2.76 3.520 16.87  0  0    3    2
## AMC Javelin         304.0 150 15.2   8 3.15 3.435 17.30  0  0    3    2
## Camaro Z28          350.0 245 13.3   8 3.73 3.840 15.41  0  0    3    4
## Pontiac Firebird    400.0 175 19.2   8 3.08 3.845 17.05  0  0    3    2
## Fiat X1-9            79.0  66 27.3   4 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2       120.3  91 26.0   4 4.43 2.140 16.70  0  1    5    2
## Lotus Europa         95.1 113 30.4   4 3.77 1.513 16.90  1  1    5    2
## Ford Pantera L      351.0 264 15.8   8 4.22 3.170 14.50  0  1    5    4
## Ferrari Dino        145.0 175 19.7   6 3.62 2.770 15.50  0  1    5    6
## Maserati Bora       301.0 335 15.0   8 3.54 3.570 14.60  0  1    5    8
## Volvo 142E          121.0 109 21.4   4 4.11 2.780 18.60  1  1    4    2

relocate() is especially useful when outputting tables (where the order of variables is important).
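
If you supply neither .before nor .after, relocate() moves the selected columns to the far left. The helper last_col(), which dplyr re-exports from tidyselect, is handy for sending a column to the end. A short sketch:

```r
library(dplyr)

# With no .before/.after, the selected columns move to the front
front <- mtcars %>% relocate(gear, carb)
names(front)[1:3]
# "gear" "carb" "mpg"

# last_col() refers to the current last column, so mpg moves to the end
back <- mtcars %>% relocate(mpg, .after = last_col())
tail(names(back), 2)
# "carb" "mpg"
```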

summarize

summarize() (or summarise()) collapses a data frame into one row per group, with groups usually defined by group_by(); on ungrouped data it returns a single row. The most common use of summarize() is calculating summary statistics:

# Group mtcars by mpg and calculate the mean cylinders and max carburetors within those groups.
mtcars %>%
  group_by(mpg) %>%
  summarize(cyl = mean(cyl),
            carb = max(carb))
## # A tibble: 25 × 3
##      mpg   cyl  carb
##    <dbl> <dbl> <dbl>
##  1  10.4     8     4
##  2  13.3     8     4
##  3  14.3     8     4
##  4  14.7     8     4
##  5  15       8     8
##  6  15.2     8     3
##  7  15.5     8     2
##  8  15.8     8     4
##  9  16.4     8     3
## 10  17.3     8     3
## # … with 15 more rows
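
Grouping by a continuous variable such as mpg produces many small groups (25 groups for 32 cars above). More often you will group by a categorical variable. A sketch grouping by cyl, where n() counts the rows in each group:

```r
library(dplyr)

# One row per number of cylinders: group size and average fuel economy
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarize(n = n(),
            mean_mpg = mean(mpg))

by_cyl
```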

Because summarize() applies functions within groups, it can do more than compute statistics. For example, it can combine text:

# Create mock data
ids <- c(1, 1, 2, 2)
text <- c("Hi", "I'm", "Reggie", "G.")
text_data <- data.frame(id = ids, text = text)
head(text_data)
##   id   text
## 1  1     Hi
## 2  1    I'm
## 3  2 Reggie
## 4  2     G.
# Combine text by ID
text_data %>%
  group_by(id) %>%
  summarize(text = paste(text, collapse = " "))
## # A tibble: 2 × 2
##      id text     
##   <dbl> <chr>    
## 1     1 Hi I'm   
## 2     2 Reggie G.

This can come in handy in a variety of contexts, such as working with survey data or parsing a PDF.

Tidy data

Most data sets that you come across while working or doing research will be messy. They will contain unexpected values, missing data, unclear variables, and other problems. To make these irregularities easier to spot and to shape data into a form that is easy to analyze, statisticians and programmers have developed a set of conventions known as tidy data. There are three important rules to remember about tidy data (see vignette("tidy-data") in the tidyr package, from which these rules are drawn):

  1. Every column is a variable.
  2. Every row is an observation.
  3. Every cell is a single value.

As you begin to work with more sophisticated data sets, keep these rules in mind, and use the tools you learn in this course and beyond to mold your data into a tidy data frame. Reshaping and cleaning data may feel like time taken away from your analysis, but the time you spend organizing your data will pay off in a more straightforward analytic process.
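
To make the rules concrete, here is a sketch using made-up data (the school names and enrollment figures are hypothetical). The wide table stores a variable, year, in its column names, so its columns are not all variables and each row holds two observations. tidyr's pivot_longer() moves the year into its own column so that each row is one school-year observation.

```r
library(tidyr)

# Hypothetical mock data: one row per school, one column per year
wide <- data.frame(school = c("School A", "School B"),
                   enroll_2021 = c(310, 275),
                   enroll_2022 = c(325, 260))

# Reshape so that each row is a single school-year observation;
# names_prefix strips "enroll_" so only the year remains
long <- pivot_longer(wide,
                     cols = c(enroll_2021, enroll_2022),
                     names_to = "year",
                     names_prefix = "enroll_",
                     values_to = "enrollment")

long
```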

Lesson 4 - Vectors and lists

Topics Covered

  1. What is a vector? What is a list?
    • What is the difference?
  2. Why are vectors useful?
  3. Vectorized operations

Lesson 5 - Conditionals and loops

Topics Covered

  1. If/then conditionals and their use
    • When to use them
  2. While loops
    • When to use them
  3. For loops
    • When to use them

Lesson 6 - Functions

Topics Covered

  1. What is a function?
  2. When to use functions
  3. Writing functions in R